Learning Restricted Boltzmann Machines with Sparse Latent Variables
Restricted Boltzmann Machines (RBMs) are a common family of undirected graphical models with latent variables. An RBM is described by a bipartite graph, with all observed variables in one layer and all latent variables in the other. We consider the task of learning an RBM given samples generated according to it. The best algorithms for this task currently have time complexity $\tilde{O}(n^2)$ for ferromagnetic RBMs (i.e., with attractive potentials) but $\tilde{O}(n^d)$ for general RBMs, where $n$ is the number of observed variables and $d$ is the maximum degree of a latent variable. Let the \textit{MRF neighborhood} of an observed variable be its neighborhood in the Markov Random Field of the marginal distribution of the observed variables. In this paper, we give an algorithm for learning general RBMs with time complexity $\tilde{O}(n^{2^s+1})$, where $s$ is the maximum number of latent variables connected to the MRF neighborhood of an observed variable. This is an improvement when $s < \log_2 (d-1)$, which corresponds to RBMs with sparse latent variables. Furthermore, we give a version of this learning algorithm that recovers a model with small prediction error and whose sample complexity is independent of the minimum potential in the Markov Random Field of the observed variables. This is of interest because the sample complexity of current algorithms scales with the inverse of the minimum potential, which cannot be controlled in terms of natural properties of the RBM.
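To make the two key quantities in the abstract concrete, here is a small illustrative sketch that computes MRF neighborhoods and the sparsity parameter $s$ from an RBM's bipartite weight matrix. It assumes the common simplification that two observed variables are MRF neighbors whenever they share a latent neighbor with nonzero weight (cancellations could in principle remove edges), and it excludes the variable itself from its own neighborhood; the paper's formal definition should be consulted for the exact convention. The function names are hypothetical.

```python
import numpy as np

def mrf_neighborhood(W, i, tol=1e-12):
    """Observed variables sharing a latent neighbor with observed variable i.

    W is the (n_observed x m_latent) weight matrix of the RBM. Assumes an
    MRF edge between two observed variables whenever they share a latent
    neighbor with nonzero weight (illustrative simplification).
    """
    latents = np.flatnonzero(np.abs(W[i]) > tol)              # latent neighbors of i
    nbrs = np.flatnonzero(np.abs(W[:, latents]).sum(axis=1) > tol)
    return set(nbrs) - {i}

def sparsity_s(W, tol=1e-12):
    """Maximum, over observed variables, of the number of latent variables
    connected to that variable's MRF neighborhood -- the parameter s."""
    n = W.shape[0]
    best = 0
    for i in range(n):
        nbhd = mrf_neighborhood(W, i, tol)
        latents = {h for j in nbhd for h in np.flatnonzero(np.abs(W[j]) > tol)}
        best = max(best, len(latents))
    return best
```

For example, with three observed variables where the middle one is connected to both latent units, every MRF neighborhood touches both latents and $s = 2$, even though the maximum latent degree $d$ is also 2.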
Review for NeurIPS paper: Learning Restricted Boltzmann Machines with Sparse Latent Variables
Additional Feedback: I found the phrase "few latent variables" confusing, since this is not really what the authors mean. In particular, their bounds use $s$, which is bounded by the number of latent variables connected to any variable in the Markov blanket of $X_i$ in the marginal distribution over the visible variables $X$. This was not clear, in my view, until the formal definition (lines 150-155). Since this style of RBM is not well studied, the practical significance of the learning rates is not clear. The type of model is intuitively appealing (many models use "local" latent variables or encourage sparsity), and perhaps this work will spur application of these styles of RBM, but it is difficult to say that it improves the theory for a well-established problem of importance.
Meta-Review for NeurIPS paper: Learning Restricted Boltzmann Machines with Sparse Latent Variables
This paper presents an algorithm for provably learning RBMs when each visible node is connected to a small number of hidden units, with bounds that improve over previous results in a specific regime. While reviewers agree the results appear sound, the paper has done little to convince the reviewers of the significance of this regime, and reviewer requests for additional intuition were not satisfied effectively in the author response. In total, though, the work appears novel and sound, and consensus is in favor of acceptance. I would strongly encourage the authors to address R2's questions. I will have to look at the paper again, but my intuition was that $s$ should bound the size of the Markov blanket, which should lead to better than $\tilde{O}(n^d)$ scaling, and the authors do not seem to have addressed this except to say it does not.
Learning Restricted Boltzmann Machines with Arbitrary External Fields
We study the problem of learning graphical models with latent variables. We give the first algorithm for learning locally consistent (ferromagnetic or antiferromagnetic) Restricted Boltzmann Machines (RBMs) with {\em arbitrary} external fields. Our algorithm has optimal dependence on dimension in both sample complexity and run time; however, it suffers from a sub-optimal dependence on the underlying parameters of the RBM. Prior results had been established only for {\em ferromagnetic} RBMs with {\em consistent} external fields (all signs must be the same) \cite{bresler2018learning}. That algorithm relies strongly on the concavity of magnetization, which does not hold in our setting. We show the following key structural property: even in the presence of an arbitrary external field, any two observed nodes that share a common latent neighbor have high covariance. This enables us to design a simple greedy algorithm that maximizes covariance to iteratively build the neighborhood of each vertex.
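The greedy strategy in this abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's algorithm: it repeatedly adds to the neighborhood of vertex $i$ the remaining vertex with the highest empirical covariance with $X_i$, stopping once no candidate exceeds a threshold. The threshold `tau` and the simple stopping rule are hypothetical stand-ins for the paper's precise parameters.

```python
import numpy as np

def greedy_neighborhood(samples, i, tau):
    """Greedy covariance-based neighborhood construction (illustrative sketch).

    samples: (num_samples x n) array of +/-1 observed configurations.
    Iteratively adds the vertex with highest absolute empirical covariance
    with X_i until no remaining vertex exceeds the threshold tau.
    """
    n = samples.shape[1]
    cov = np.cov(samples, rowvar=False)          # empirical covariance matrix
    candidates = set(range(n)) - {i}
    neighborhood = set()
    while candidates:
        j = max(candidates, key=lambda v: abs(cov[i, v]))
        if abs(cov[i, j]) < tau:
            break                                # no candidate is correlated enough
        neighborhood.add(j)
        candidates.remove(j)
    return neighborhood
```

On data where variable 1 tracks variable 0 and variable 2 is independent of both, the sketch recovers `{1}` as the neighborhood of vertex 0; the structural property above is what guarantees that true latent-sharing neighbors exhibit such detectable covariance.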